Introduction

This session is adapted from the Data Carpentry lesson “Data visualization with ggplot2”: https://datacarpentry.org/R-ecology-lesson/04-visualization-ggplot2.html

The examples in this session show how to generate and customize plots using ggplot2 in R, using an ecology dataset containing survey data on animal species diversity and weights within a study site.

Example

Download and load data

We download and load the dataset in .csv format from GitHub. The saved .csv file on GitHub includes all preprocessing from the earlier chapters in the Data Carpentry materials.

Code for the preprocessing steps is also included in the previous script, available on GitHub.

library(tidyverse)
library(here)
library(ggplot2)
library(hexbin)
# download saved dataset from GitHub
download.file(
  url = "https://raw.githubusercontent.com/lmweber/PHDS-data-visualization-2024/main/data/surveys_complete.csv", 
  destfile = here("data/surveys_complete.csv")
)
# load data
surveys_complete <- read_csv(here("data/surveys_complete.csv"))
## Rows: 30463 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): species_id, sex, genus, species, taxa, plot_type
## dbl (7): record_id, month, day, year, plot_id, hindfoot_length, weight
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Have a look at the dataset.

head(surveys_complete)
## # A tibble: 6 × 13
##   record_id month   day  year plot_id species_id sex   hindfoot_length weight
##       <dbl> <dbl> <dbl> <dbl>   <dbl> <chr>      <chr>           <dbl>  <dbl>
## 1       845     5     6  1978       2 NL         M                  32    204
## 2      1164     8     5  1978       2 NL         M                  34    199
## 3      1261     9     4  1978       2 NL         M                  32    197
## 4      1756     4    29  1979       2 NL         M                  33    166
## 5      1818     5    30  1979       2 NL         M                  32    184
## 6      1882     7     4  1979       2 NL         M                  32    206
## # ℹ 4 more variables: genus <chr>, species <chr>, taxa <chr>, plot_type <chr>

Scatter plot

Generate an initial scatter plot to visualize the data.

Note the syntax used for the ggplot() function. The first argument (data) provides the input data frame, and the second argument (mapping = aes()) specifies the mapping of data variables to plot aesthetics (in this case the x and y axes), followed by the + operator to provide additional functions, and the geom_point() function to specify that we want to plot the data as points.

# generate an initial scatter plot
ggplot(surveys_complete, aes(x = weight, y = hindfoot_length)) + 
  geom_point()

Alternative geom

We can also specify a different geom (geometric object) to plot the data in a different way.

This demonstrates how easily ggplot allows us to change the type of plot, once we have provided the data and specified the mapping of data variables to plot aesthetics.

# 'geom_hex()'
ggplot(surveys_complete, aes(x = weight, y = hindfoot_length)) + 
  geom_hex()

Modifying the plot

We can modify the plot by providing additional arguments.

For example, the alpha argument specifies a level of transparency of points.

# 'alpha' argument
ggplot(surveys_complete, aes(x = weight, y = hindfoot_length)) + 
  geom_point(alpha = 0.1)

The color argument sets colors.

# 'color' argument
ggplot(surveys_complete, aes(x = weight, y = hindfoot_length)) + 
  geom_point(alpha = 0.1, color = "blue")

We can also use the color argument within aes() to color points by the values of a variable. Note that we can specify color within aes() either at the top level (within the ggplot2() function) or within the geom_point() function. This gives flexibility for setting colors only for certain geoms or for the whole plot.

# set 'color' using categorical variable

ggplot(surveys_complete, aes(x = weight, y = hindfoot_length)) + 
  geom_point(alpha = 0.1, aes(color = species_id))

ggplot(surveys_complete, aes(x = weight, y = hindfoot_length, 
                             color = species_id)) + 
  geom_point(alpha = 0.1)

Boxplots

Generate boxplots using geom_boxplot().

# boxplots
ggplot(surveys_complete, aes(x = species_id, y = weight)) + 
  geom_boxplot()

Add points and additional formatting arguments to the boxplots. Note the order in which the plot elements are added affects the output.

# boxplots with points and additional formatting
ggplot(surveys_complete, aes(x = species_id, y = weight)) + 
  geom_boxplot(outlier.shape = NA) + 
  geom_jitter(alpha = 0.3, color = "tomato")

# note order of plot elements
ggplot(surveys_complete, aes(x = species_id, y = weight)) + 
  geom_jitter(alpha = 0.3, color = "tomato") + 
  geom_boxplot(outlier.shape = NA)

# additional formatting
ggplot(surveys_complete, aes(x = species_id, y = weight)) + 
  geom_jitter(alpha = 0.3, color = "tomato") + 
  geom_boxplot(outlier.shape = NA, fill = NA)

Violin plots

Alternatively, we can use violin plots. (Which of these alternative visualizations seems to be the most effective for this dataset?)

# violin plots
ggplot(surveys_complete, aes(x = species_id, y = weight)) + 
  geom_violin()

ggplot(surveys_complete, aes(x = species_id, y = weight)) + 
  geom_violin(fill = "#40E0D0")

Line plots

To demonstrate line plots for time series data, we first calculate an additional variable.

# calculate number of counts per year for each genus
yearly_counts <- 
  surveys_complete |> 
  count(year, genus)

head(yearly_counts)
## # A tibble: 6 × 3
##    year genus               n
##   <dbl> <chr>           <int>
## 1  1977 Chaetodipus         3
## 2  1977 Dipodomys         222
## 3  1977 Onychomys           1
## 4  1977 Perognathus        22
## 5  1977 Peromyscus          2
## 6  1977 Reithrodontomys     2

To generate line plots, we can specify group in aes(). However, this does not allow us to identify the lines.

# line plot
ggplot(yearly_counts, aes(x = year, y = n, group = genus)) + 
  geom_line()

We can use color to identify the lines. Note that setting color in aes() also automatically groups the data.

# color by 'genus' variable
ggplot(yearly_counts, aes(x = year, y = n, color = genus)) + 
  geom_line()

Facetting

In ggplot, facetting refers to splitting plots into multiple panels according to some variable. This can be very useful, and requires a special syntax.

Here, we use facet_wrap() to facet the line plots by genus.

# facet by 'genus'
ggplot(yearly_counts, aes(x = year, y = n)) + 
  geom_line() + 
  facet_wrap(vars(genus))

We can also include color. However, in this case we have some redundancy between the facetting and the colors. Depending on the dataset, this type of redundancy may either help the reader or overcomplicate things.

# facet by 'genus' and include color
ggplot(yearly_counts, aes(x = year, y = n, color = genus)) + 
  geom_line() + 
  facet_wrap(vars(genus))

Alternatively, we can use facetting and further split the lines by sex. To do this, we need to do some further data manipulation.

# split counts by sex
yearly_sex_counts <- 
  surveys_complete |> 
  count(year, genus, sex)

Now we can create a facetted plot with lines split and colored by the sex variable.

# facetted plot
ggplot(yearly_sex_counts, aes(x = year, y = n, color = sex)) + 
  geom_line() + 
  facet_wrap(vars(genus))

We can also facet by multiple variables using facet_grid(). In this case, the plot is starting to get quite busy and becoming more difficult to read.

# facet by multiple variables using 'facet_grid()'
ggplot(yearly_sex_counts, aes(x = year, y = n, color = sex)) + 
  geom_line() + 
  facet_grid(rows = vars(sex), cols = vars(genus))

Facets may be organized by either rows or columns.

# facet by rows with one column
ggplot(yearly_sex_counts, aes(x = year, y = n, color = sex)) + 
  geom_line() + 
  facet_grid(rows = vars(genus))

# facet by columns with one row
ggplot(yearly_sex_counts, aes(x = year, y = n, color = sex)) + 
  geom_line() + 
  facet_grid(cols = vars(genus))

Themes

Themes provide a convenient way to format plots in ggplot2.

For example, theme_bw() (for “black and white”) adjusts the formatting to show white backgrounds, black borders, gray gridlines, and additional default settings. There are also several other themes available.

# demonstrate themes

ggplot(yearly_sex_counts, aes(x = year, y = n, color = sex)) + 
  geom_line() + 
  facet_wrap(vars(genus)) + 
  theme_bw()

ggplot(yearly_sex_counts, aes(x = year, y = n, color = sex)) + 
  geom_line() + 
  facet_wrap(vars(genus)) + 
  theme_classic()

ggplot(yearly_sex_counts, aes(x = year, y = n, color = sex)) + 
  geom_line() + 
  facet_wrap(vars(genus)) + 
  theme_minimal()

Additional customizations

Additional functions are available to further customize the plot. For example, we can add more informative plot titles and axis titles.

# specify plot title and axis titles
ggplot(yearly_sex_counts, aes(x = year, y = n, color = sex)) + 
  geom_line() + 
  facet_wrap(vars(genus)) + 
  labs(title = "Observed genera through time", 
       x = "Year of observation", 
       y = "Number of individuals") + 
  theme_bw()

The theme() function can be used to add further customizations, such as font size. Note the special syntax using element_text().

# using 'theme()' to adjust font size
ggplot(yearly_sex_counts, aes(x = year, y = n, color = sex)) + 
  geom_line() + 
  facet_wrap(vars(genus)) + 
  labs(title = "Observed genera through time", 
       x = "Year of observation", 
       y = "Number of individuals") + 
  theme_bw() + 
  theme(text = element_text(size = 16))

Numerous additional options are available for detailed customizations in ggplot.

# demonstrate additional options for 'theme()'
ggplot(yearly_sex_counts, aes(x = year, y = n, color = sex)) + 
  geom_line() + 
  facet_wrap(vars(genus)) + 
  labs(title = "Observed genera through time", 
       x = "Year of observation", 
       y = "Number of individuals") + 
  theme_bw() + 
  theme(axis.text.x = element_text(color = "grey20", size = 12, angle = 90, 
                                   hjust = 0.5, vjust = 0.5), 
        axis.text.y = element_text(color = "gray20", size = 12), 
        strip.text = element_text(face = "italic"), 
        text = element_text(size = 16))

Using ‘patchwork’ for multiple panels

Here, we demonstrate the use of the patchwork package to arrange multiple plots in panels.

First, we load the patchwork package.

library(patchwork)

The patchwork package uses a specific syntax (| for columns, / for rows).

We start by creating some plots and storing them by assigning them to variables.

# create plots and assign to variables

plot_weight <- ggplot(surveys_complete, aes(x = species_id, y = weight)) + 
  geom_boxplot() + 
  labs(x = "Species", y = expression(log[10](Weight))) + 
  scale_y_log10()

plot_count <- ggplot(yearly_counts, aes(x = year, y = n, color = genus)) + 
  geom_line() + 
  labs(x = "Year", 
       y = "Abundance")

Now, we can display the plots in panels using the patchwork syntax.

# display plots in rows
plot_weight / plot_count

# modify row heights
plot_weight / plot_count + 
  plot_layout(heights = c(3, 2))

# display plots in columns
plot_weight | plot_count

Export plots

Finally, we can export or save plots using the ggsave() function. Usually we will save plots in either .png or .pdf format. The ggsave() function detects the format automatically from the file name.

# generate plot
my_plot <- ggplot(yearly_sex_counts, aes(x = year, y = n, color = sex)) + 
  geom_line() + 
  facet_wrap(vars(genus)) + 
  labs(title = "Observed genera through time", 
       x = "Year of observation", 
       y = "Number of individuals") + 
  theme_bw() +
  theme(axis.text.x = element_text(color = "gray20", size = 12, angle = 90, 
                                   hjust = 0.5, vjust = 0.5), 
        axis.text.y = element_text(color = "gray20", size = 12), 
        text = element_text(size = 16))
# display plot
my_plot

Save plot in .png format.

# save plot in .png format
ggsave(here("plots/my_plot.png"), my_plot, width = 15, height = 10)

Save plot in .pdf format.

# save plot in .png format
ggsave(here("plots/my_plot.pdf"), my_plot, width = 15, height = 10)

We can also specify the resolution for .png format. Here, we save a combined plot using patchwork and ggsave() in .png format with a customized resolution.

# save combined plot with custom resolution in .png format
plot_combined <- 
  plot_weight / plot_count + 
  plot_layout(heights = c(3, 2))

ggsave(here("plots/plot_combined.png"), plot_combined, width = 10, dpi = 300)
## Saving 10 x 5 in image